Implementation of a Parallel Radix Sort on Cell Processor

نویسنده

  • Luong D. Hung
چکیده

Cell Broadband Engine (CBE) possesses tremendous raw computation power. Nevertheless, developing programs that can fully exploit that computation power is challenging. Cell Speed Challenge 2007 gives its contestants the opportunity to strive for the highest performance of data sorting on Cell. This work describes the design and implementation of a parallel radix sort. Data buffering, scalar code optimizations are employed to make the algorithm work effectively on Cell. Evaluation results show that significant performance can be achieved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Partitioned Parallel Radix Sort

Load balanced parallel radix sort solved the load imbalance problem present in parallel radix sort. By redistributing the keys in each round of radix, each processor has exactly the same number of keys, thereby reducing the overall sorting time. Load balanced radix sort is currently known as the fastest internal sorting method for distributed-memory multiprocessors. However, as the computation ...

متن کامل

PARALLEL SORTING AND MOTIF SEARCH By SHIBDAS BANDYOPADHYAY A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy PARALLEL SORTING AND MOTIF SEARCH By Shibdas Bandyopadhyay August 2012 Chair: Sartaj Sahni Major: Computer Engineering With the proliferation of multi-core architectures, it has become increasingly important to design versions of popular...

متن کامل

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

متن کامل

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

متن کامل

Keys Per Processor ( n / p ) Radix SortBitonic Sort Sample Sort Simple Radix Sort

We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating the primitive operations that it is capable of performing along with the cost of each operation. Next, we analyze an algorithm by making a precise count of the number of times the algorithm perform...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007